{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第三周作业： 在 Rental Listing Inquiries 数据上练习 xgboost 参数调优"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "问题描述\n",
    "采用xgboost模型完成商品分类（需进行参数调优）。\n",
    "\n",
    "解题提示\n",
    "\n",
    "为减轻大家对特征工程的入手难度，以及统一标准，数据请用课程网站提供的特征工程编码后的数据（RentListingInquries_FE_train.csv）或稀疏编码的形式（RentListingInquries_FE_train.bin）。xgboost既可以单独调用，也可以在sklearn框架下调用。大家可以随意选择。若采用xgboost单独调用使用方式，建议读取稀疏格式文件。\n",
    "\n",
    "批改标准\n",
    "\n",
    "独立调用xgboost或在sklearn框架下调用均可。\n",
    "\n",
    "1.模型训练：超参数调优\n",
    "\n",
    "a) 初步确定弱学习器数目： 20分\n",
    "\n",
    "b) 对树的最大深度（可选）和min_children_weight进行调优（可选）：20分\n",
    "\n",
    "c) 对正则参数进行调优：20分\n",
    "\n",
    "d) 重新调整弱学习器数目：10分\n",
    "\n",
    "e) 行列重采样参数调整：10分\n",
    "\n",
    "2.调用模型进行测试10分\n",
    "\n",
    "3.生成测试结果文件10分"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## （1） 导入相应数据包"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 载入库\n",
    "import pandas as pd \n",
    "import numpy as np\n",
    "\n",
    "from xgboost.sklearn import XGBClassifier\n",
    "import xgboost as xgb\n",
    "\n",
    "from sklearn import metrics\n",
    "from sklearn.metrics import log_loss\n",
    "from sklearn.metrics import *\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "from sklearn.model_selection import StratifiedKFold\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## （2） 读取相应的数据源文件"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "老师给我们提供了已经进行特征处理过的文件，我们直接读取就可以了，不用在进行相应的特征处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#读入CSV文件\n",
    "data_train =pd.read_csv('RentListingInquries_FE_train.csv')\n",
    "data_test = pd.read_csv('RentListingInquries_FE_test.csv')"
   ]
  },
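  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The assignment also provides the data in xgboost's sparse binary format (RentListingInquries_FE_train.bin), which the native API can load directly. A minimal sketch of how that round trip works, using a small random matrix as a stand-in for the course data:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import xgboost as xgb\n",
    "\n",
    "# Hypothetical stand-in for the course data\n",
    "X = np.random.rand(10, 4)\n",
    "y = np.random.randint(0, 3, size=10)\n",
    "dtrain = xgb.DMatrix(X, label=y)\n",
    "dtrain.save_binary('demo.bin')     # analogous to the provided .bin file\n",
    "dtrain2 = xgb.DMatrix('demo.bin')  # load the binary file directly\n",
    "```"
   ]
  },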
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#由于机器的原因，只选取1万5千条数据\n",
    "data_train_new = data_train[30000:45000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(15000, 228)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#查看数据情况\n",
    "data_train_new.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一共1万5千条数据，227个特征值，1个目标值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## （3） 分离目标值以及特征值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "x_train = data_train_new.drop(columns=['interest_level'])\n",
    "y_train = data_train_new['interest_level']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  （4） 初步确定弱学习器数目"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "#设置交叉验证的次数\n",
    "kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "#对 xgboost 进行配置\n",
    "xgb1 = XGBClassifier(\n",
    "        learning_rate =0.1,\n",
    "        n_estimators=1000,  #数值大没关系，cv会自动返回合适的n_estimators\n",
    "        max_depth=5,\n",
    "        min_child_weight=1,\n",
    "        gamma=0,\n",
    "        subsample=0.8,\n",
    "        colsample_bytree=0.8,\n",
    "        colsample_bylevel=0.7,\n",
    "        objective= 'multi:softprob',\n",
    "        seed=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "xgb_param = xgb1.get_xgb_params()\n",
    "#三中分类的问题，把num_class 设置为3\n",
    "xgb_param['num_class'] = 3\n",
    "#通过Dmatrix创建xgboost 需要的数据格式\n",
    "xgtrain = xgb.DMatrix(x_train, label=y_train) \n",
    "#通过xgboost 中内置的交叉验证选择初始的若学习期数目\n",
    "cvresult_round1 = xgb.cv(xgb_param, xgtrain, num_boost_round=xgb1.get_params()['n_estimators'], folds =kfold,metrics='mlogloss', early_stopping_rounds=50)"
   ]
  },
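  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With early_stopping_rounds=50, xgb.cv stops once the test mlogloss has not improved for 50 rounds and returns a DataFrame truncated at the best round, so its length can be read off as the initial n_estimators. A minimal sketch of reading such a result, using a hypothetical small DataFrame in place of cvresult_round1:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-in: mean test mlogloss per boosting round\n",
    "cv = pd.DataFrame({'test-mlogloss-mean': [1.05, 0.90, 0.82, 0.79, 0.78]})\n",
    "n_estimators = cv.shape[0]                # rounds kept after early stopping\n",
    "best = cv['test-mlogloss-mean'].iloc[-1]  # best mean test mlogloss\n",
    "print(n_estimators, best)\n",
    "```"
   ]
  },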
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 将结果保存起来\n",
    "cvresult_round1.to_csv('xgBoostRentalCV_Result1.csv', index_label = 'n_estimators')  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEXCAYAAABCjVgAAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3XecHVX9//HXZ3t63bTdNJJACCGEEFqoUpQiVUSagiCIigKKCop+lZ8odlEQRAREpAmIAcFGVWlJMIGEELLpm96zSbbv5/fHmd3c3dxNbsrdubv3/Xww7J2ZMzOfnZu9nzvnnDlj7o6IiAhATtwBiIhI5lBSEBGRJkoKIiLSRElBRESaKCmIiEgTJQUREWmipCCSwMy+YWb3xh2HSFyUFNoZM+tqZgvN7KKEZd3MbLGZnZewbKKZPWtm681sg5m9Z2a3mlmvaP1lZlZvZpujab6ZfS7NsR9vZuXpPMauSBaPu3/f3T+TpuMtNLOT0rHvdGir96u9nZeOTkmhnXH3zcBVwO1mVhwt/hEw1d2fADCzScDLwH+B0e7eEzgFqAMOStjd6+7e1d27AucBPzKzg9vmN5FdYWZ5cccgWcLdNbXDCXgAeAQ4HlgLDExY9x/gVzvZ/jLgPy2WvQVclDB/JjAL2EBIMvsnrNs/WrYhKnNmwrrTgPeACmApcAPQBagEGoDN0TSold/rTuCv0fZvAiNSOB+jgX8C64A5wPm7Ew/wHeChaLthgAOfBpYA64GrgUOBd6Lf/Y6E44wAXozejzXAH4Ge0bo/RMeqjI71tRTO8ULg69GxqoG8aH5p9LvMAU5Mci6OAFYAuQnLzgHeiV4fBkwFNgErgZ+1ck6PB8pbWdcDeBBYDSwCbgZyonW5wE+jc7AAuCY6j3mt7GshcFIr664EyqL3dXLjvxnAgJ8Dq4CN0Tka29r7Hfffa3uaYg9A026+cdALWB794X06YXkXoB44fifbX0ZCUog+6DYA+0bz+wJbgJOBfOBr0R9nQTRfBnwjmj8h+gPcL9p2OXBMQpwTotetfsgkxPFA9AFwWPQh+Efg0Z1s04Xwof3paJsJ0Xk5YFfjIXlSuBsoAj4MVAFPA/2AkuhD6bio/MjofBUCxcCrwC8S9t3sw29H5zih/HRgMNAJ2C/6PQclxJc0YQLzgJMT5v8E3Bi9fh34ZPS6K3BEK/to9f0iJIS/AN2iOD4ArojWXU34UC6Nzve/2I2kEP27WhO9n4XAr4BXo3UfAaYBPQkJYn+iL0atvd+aUptUfdROuft6wjfMzsBTCat6EaoFVzQuMLMfRe0KW8zs5oSyR0TLNxOuEv4AzI3WfQL4q7v/091rgZ8QPpgmEb6JdgVuc/cad38ReBa4MNq2FhhjZt3dfb27v72Lv95T7v6Wu9cRksL4nZT/KLDQ3e9397roeE8SqsT2Rjz/z92r3P0fhA/xR9x9lbsvBf4NHAzg7mXR+ap299XAz4DjdrDfHZ3jRr909yXuXklI9oXR75Lv7gvdfV4r+36E6P0ws26Eb8+PJJyPkWbW1903u/sbu3IyzCw3iv0md69w94WEK4NPRkXOB2539/Lo3+ltu7L/BBcD97n72+5eDdwEHGlmw6LfoRvhCtHcfba7L0/4/fbk/c5qSgrtlJldQviG9i/ghwmr1hOqKQY2LnD3r3loV/gz4Zt0ozfcvaeHNoUBwAHA96N1gwjVAo37aCB8Sy2J1i2JljVaFK0D+BjhQ2iRmb1iZkfu4q+3IuH1VkIC2pGhwOFRgttgZhsIHygD9lI8KxNeVyaZ7wpgZv3M7FEzW2pmm4CHgL472O+OznGjJQnry4DrCFczq6JjDWpl3w8D55pZIXAu8La7Nx7rCsJVyvtmNsXMPrqDGJPpS7hCXJSwLPH9H5QYd4vXu6Ll+dlMqJorib6I3EGoalxpZveYWfeo6J6+31lNSaEdMrN+hPrU
K4HPAueb2bEA7r6FUA9/7q7s091XEr5dnxEtWkb4sG08phGqMZZG6wabWeK/nyHROtx9irufRahieRp4vPEwuxLTLlgCvBIluMapq7t/ro3j+UG0z3Hu3h24hFC10ajl8XZ0jpNu4+4Pu/vR0XZO8y8EieXeI3ygngpcREgSjevmuvuFhPPxQ+AJM+uS+q/JGsK38aEJy5ref0L1TWnCusG7sO9ELc9PF6AP2/6d/dLdDyF8mdkX+Gq0vLX3W1KgpNA+3QE87e4vRZfMXwN+G30rJJq/3MxujBIIZlYKDG9th2bWh9AYOSta9DhwupmdaGb5wFcIjZ2vEZLOFuBrZpZvZscTksmjZlZgZhebWY+oSmQTodoDwjfsPmbWYy+dh0bPAvua2SejePLN7FAz27+N4+lGaETeYGYlRB9SCVYC+yTM7+gcb8fM9jOzE6L3uYpwlVKfrGzkYeBLwLGENoXG/VxiZsXRlcmGaHGr+zGzosSJcCX6OHBr1B16KPBlwpVR4+91rZmVmFlPQuP4zuS3OE5eFP+nzWx89Dt/H3jT3RdG7+/h0XnbEp2P+p2835KKuBs1NO3aBJxN+AbVs8XyF4BbE+YPB54j/NFvAGYCtwJ9ovWXEf5YGnverCLUOfdL2Mc5hAbDjcArRA230boDomUbozLnRMsLgL8RqrE2AVOAoxO2u49QBbCB1nsffS9h/nh20jgdlduP0GNpdbT/FwltEbsUD8kbmvMSypeT0IhP+CC8OeGcTIvO53TCh3x5QtmzgMXRsW5I4RwvpHnD9DhC208FoTH+2WTnMKH8EMIH+F9bLH8oer83E74EnN3K9sdHv3/LaSSh7eqh6HwvAb7Ntt5HeYQr2bWE3kfXE64srJXjLExyjO9F664mNJo3/r6l0fITCT2ONrOtp1fXnb3fmnY+WXSCRUTSwsxOBe5296E7LSyxU/WRiOxVZtbJzE4zs7yoGu3/CJ0cpB3QlYK0C2Z2DPB8snUeek9JhjCzzoSqsNGEdo+/Ate6+6ZYA5OUKCmIiEgTVR+JiEiTdjfIVt++fX3YsGFxhyEi0q5MmzZtjbsX76xcu0sKw4YNY+rUqXGHISLSrpjZop2XUvWRiIgkUFIQEZEmSgoiItJESUFERJooKYiISBMlBRERaaKkICIiTbImKZSv38o/Zq3YeUERkSyWNUnhp3fdzdpHrqaisibuUEREMlbWJIUr9q/jwryXWL68PO5QREQyVtYkhS7F4UmU65eWxRyJiEjmypqk0KtkBABbVi2IORIRkcyVNUmhx4DwvPT6dSmNCSUikpWyJilYp55spgu5FWpTEBFpTdYkBYB1+QPovHVZ3GGIiGSsrEoKWzsPpE/tCvQIUhGR5LIqKdR1G8wA1rBxq+5VEBFJJquSQl7vIXSzSpat1J3NIiLJZFVS6NxP9yqIiOxI2pKCmd1nZqvMbGYr683MfmlmZWb2jplNSFcsjXqVjARgs+5VEBFJKp1XCg8Ap+xg/anAqGi6CrgrjbEA0K1fuFehQfcqiIgklbak4O6vAut2UOQs4EEP3gB6mtnAdMUDQOfeVFFIzibdqyAikkycbQolwJKE+fJo2XbM7Cozm2pmU1evXr37RzRjXf4AulTqXgURkWTiTAqWZFnSGwjc/R53n+juE4uLi/fooFs6DaK37lUQEUkqzqRQDgxOmC8F0v4Vvr57KQNZw5rNuldBRKSlOJPCZOBTUS+kI4CN7r483QfN7zOUXraZxStWpvtQIiLtTl66dmxmjwDHA33NrBz4PyAfwN3vBp4DTgPKgK3Ap9MVS6LuA0fBDFizeA6MGtIWhxQRaTfSlhTc/cKdrHfgC+k6fmt6Dx4NwJblHwAnt/XhRUQyWlbd0QyQ2zc8bId18+MNREQkA2VdUqCwGxtze9GpYmHckYiIZJzsSwrApk6D6VO9lPoGdUsVEUmUlUmhrudwhtpylq6vjDsUEZGMkpVJIa94JP1tAwtXrIo7FBGR
jJKVSaFHSeiBtG7J+zFHIiKSWbIyKXQbNAqAqpVzY45ERCSzZGVSsN6hW6qtVbdUEZFEWZkUKOpORW4vOm9ZHHckIiIZJTuTAlDRZQj9asupqq2POxQRkYyRtUnh3a19GGormbd6c9yhiIhkjKxNChMOnsgAW8/8peqWKiLSKGuTQq/5zwCwZvHsmCMREckcWZsU8s6/D4Ca5UoKIiKNsjYp0GcE9eRQuF73KoiINMrepJBXyKZOg+lfs5CKqtq4oxERyQjZmxSAmt77MsqW8sFK9UASEYEsTwpFg8YwzFZQtmxt3KGIiGSErE4K3UrHkmcNrF38XtyhiIhkhKxOCjn9wmiptSvUA0lEBLI8KdB3FA0YRRvUA0lEBLI9KeR3YnOnUkrqFrNmc3Xc0YiIxC67kwJQF/VAmr18U9yhiIjELuuTQufSAxhuy5m1ZE3coYiIxC6tScHMTjGzOWZWZmY3Jlk/1MxeMLN3zOxlMytNZzzJFA0cQ4HVs3KhGptFRNKWFMwsF7gTOBUYA1xoZmNaFPsJ8KC7jwNuAX6Qrnha1T+EVL98ZpsfWkQk06TzSuEwoMzd57t7DfAocFaLMmOAF6LXLyVZn37Fo6m3PAZUzmX9lpo2P7yISCZJZ1IoAZYkzJdHyxLNAD4WvT4H6GZmfVruyMyuMrOpZjZ19erVezfKvEIqe45ijC1i5rKNe3ffIiLtTDqTgiVZ5i3mbwCOM7P/AccBS4G67TZyv8fdJ7r7xOLi4r0eaEHJQYzJWcS7S5UURCS7pTMplAODE+ZLgWWJBdx9mbuf6+4HA9+MlrX5J3NByUH0sw0sXrSgrQ8tIpJR0pkUpgCjzGy4mRUAFwCTEwuYWV8za4zhJuC+NMbTuoHjAKhdNiOWw4uIZIq0JQV3rwOuAf4OzAYed/dZZnaLmZ0ZFTsemGNmHwD9gVvTFc8O9R8LQL/NamwWkeyWl86du/tzwHMtln074fUTwBPpjCElnXpS1aWUMZsWcsE9b/D364+NOyIRkVhk/R3NjXJLDmKMLeKUsQPiDkVEJDZKCpH8QQcxPGcFMxcs23lhEZEOSkmh0YADycGpWvgWdfUNcUcjIhILJYVGgw4GYF9bzPsrKmIORkQkHkoKjboPpK7rIMbnzGPaovVxRyMiEgslhQS5gydySO48piopiEiWUlJIYKUTKWUl8xbozmYRyU5KColKJgIwYPMslm2ojDkYEZG2p6SQaNB43HIZn1PGJ3/3ZtzRiIi0ubTe0dzuFHSBfvszceV8lg3rHXc0IiJtTlcKLVjpRMbbPF4rWxV3KCIibU5JoaWSiXT2LeRvmM+SdVvjjkZEpE0pKbRUeigAh+R8wGvz1sQcjIhI21JSaKl4P7xzH44tmMt/y9bGHY2ISJtSUmjJDBs6iSNzZ/PavLW4t3yCqIhIx6WkkMzQo+lTt5LCzeXMXbU57mhERNqMkkIyw44C4LCc93n1g9UxByMi0naUFJLpdwAU9eDDXcp4Yba6popI9lBSSCYnB4ZM4vDc2UxZuI6NlbVxRyQi0iaUFFoz7Ch6V5XTu2Edr6gKSUSyhJJCa4aGdoWT8qbzwuyVMQcjItI2lBRaM/Ag6NSLc/os4eU5q/WIThHJCkoKrcnJhX0+xIHV09hYWaMH74hIVkhrUjCzU8xsjpmVmdmNSdYPMbOXzOx/ZvaOmZ2Wznh22ciTKKpazbi8cv42c0Xc0YiIpF3akoKZ5QJ3AqcCY4ALzWxMi2I3A4+7+8HABcCv0xXPbhl5IgCX9ZvLc+8up75BdzeLSMeWziuFw4Ayd5/v7jXAo8BZLco40D163QNYlsZ4dl23AdB/LMfmvsuqimreWrAu7ohERNJqp0nBzEaYWWH0+ngz+5KZ9Uxh3yXAkoT58mhZou8Al5hZOfAc8MWUom5LI0+kz9q36ZNfy7PvZFbOEhHZ21K5UngSqDezkcDvgOHAwylsZ0mWtax/uRB4wN1LgdOAP5jZdjGZ2VVm
NtXMpq5e3cb3DIw8CWuo5aoh5Tw/cwW16oUkIh1YKkmhwd3rgHOAX7j79cDAFLYrBwYnzJeyffXQFcDjAO7+OlAE9G25I3e/x90nuvvE4uLiFA69Fw0+AiyX09bcz7otNbw2T8Npi0jHlUpSqDWzC4FLgWejZfkpbDcFGGVmw82sgNCQPLlFmcXAiQBmtj8hKWTW7cN5BTDmLErzNtKzKIen3i6POyIRkbRJJSl8GjgSuNXdF5jZcOChnW0UXV1cA/wdmE3oZTTLzG4xszOjYl8BrjSzGcAjwGWeiQ8wGH06tmU1nx+1gednrmDjVo2FJCIdk+3KZ7CZ9QIGu/s76QtpxyZOnOhTp05t24NWbYQfjWD12Ms59K3j+O6ZB3DppGFtG4OIyB4ws2nuPnFn5VLpffSymXU3s97ADOB+M/vZ3giy3SjqAcOPobj8X4wd1I1HpyzRE9lEpENKpfqoh7tvAs4F7nf3Q4CT0htWBhp9Oqybx5Vj6pm9fBMzl26KOyIRkb0ulaSQZ2YDgfPZ1tCcffYLI3B85J3ryTG4/IEpMQckIrL3pZIUbiE0Fs9z9ylmtg8wN71hZaDug2DwERQVFvGJQwezqaqWdVtq4o5KRGSv2mlScPc/ufs4d/9cND/f3T+W/tAy0IHnwerZXD26iuq6Bh5+c1HcEYmI7FWpNDSXmtmfzWyVma00syfNrLQtgss4B5wDlsvQZc9x7L7FPPj6ImrqdIeziHQcqVQf3U+46WwQYeyiZ6Jl2adLXxjxIXj3SS6fNJRVFdX89V2NhyQiHUcqSaHY3e9397poegBo47EmMsiBH4eNizmu8wL27d+VO1+apyG1RaTDSCUprDGzS8wsN5ouAbJ3AKDRp4PlYI9fyrUn7kvZqs0aPVVEOoxUksLlhO6oK4DlwHmEoS+yU2E3OPB8qNnCqft2Y/SAbvziX3P1DGcR6RBS6X202N3PdPdid+/n7mcTbmTLXodcCjUV5Mx+mutP3pcFa7bw5/8tjTsqEZE9trtPXvvyXo2ivRlyJPTdF6b9ng+P6c+BJT34+T8/oKq2Pu7IRET2yO4mhWQP0MkeZjDhUih/C1s1m2+ctj/LNlbxu/8siDsyEZE9srtJQd1tDroQMHjoXI4c0YeTx/Tn1y+VsbqiOu7IRER2W6tJwcwqzGxTkqmCcM9CduvSBw66AKo2QeV6bjp1NNV1Dfzsn3PijkxEZLe1mhTcvZu7d08ydXP3vLYMMmMd+QWo3QLTHmCf4q5cOmkYj05ZwvQlG+KOTERkt+xu9ZEADDgQhh8Hb/4G6mq47qRRFHct5Oan39UNbSLSLikp7KlJX4SK5TDrKboV5XPzR8cwc+km/qjB8kSkHVJS2FMjT4L8zvDs9dDQwBnjBtK9KI/vTJ7ForVb4o5ORGSXKCnsKTM443ao3QpznsPMeP66Y+lSmMf1j03Xnc4i0q6kMnR2sl5IS6LhtPdpiyAz3gHnQq/h8OqPwJ2Snp343tljeXvxBu58aV7c0YmIpCyVK4WfAV8lDJtdCtwA/BZ4FLgvfaG1I7l5cMxXYPkMKPsXAGeNL+Hs8YO4/YUPeOWD1TEHKCKSmlSSwinu/ht3r3D3Te5+D3Cauz8G9EpzfO3HuE9AbiH86TLw0PPo++ceyL79u/HFh99W+4KItAupJIUGMzvfzHKi6fyEdep32SivAE7/CdRshtnPANC5II97PjkRM+Mzv5/KxsramIMUEdmxVJLCxcAngVXR9EngEjPrBFyzow3N7BQzm2NmZWZ2Y5L1Pzez6dH0gZm177u+DroI+u4HL9wC9XUADOnTmbsumcDCtVu48sGpGjRPRDJaKkNnz3f3M9y9bzSd4e5l7l7p7v9pbTszywXuBE4FxgAXmtmYFvu+3t3Hu/t44FfAU3v268QsNw9O/DasnQvTH2paPGlEX356/njeWrCO6x+brhvbRCRjpdL7qDTqabTKzFaa2ZNmVprCvg8DyqKkUkNomD5r
B+UvBB5JLewMNvr08CCev94AldsufM48aBDf+ugYnp+5gu8+Mwt3JQYRyTypVB/dD0wmDIJXAjwTLduZEmBJwnx5tGw7ZjYUGA68mMJ+M5sZXPoMNNTBS99vtuqKo4fz2WP34cHXF3HnS2UxBSgi0rpUkkKxu9/v7nXR9ABQnMJ2yZ650NrX4wuAJ9w9aYW7mV1lZlPNbOrq1e2ge+egg+HQz8CU38Ky6c1Wff2U0ZxzcAk/+ccH3Pb8+zSoKklEMkgqSWGNmV1iZrnRdAmwNoXtyoHBCfOlQGtPuL+AHVQdufs97j7R3ScWF6eSjzLACTeD5cIDp0PDtlyXk2P8+LxxXHz4EO5+ZR7XPTad6jo1PotIZkglKVwOnA+sAJYD5wGfTmG7KcAoMxtuZgWED/7JLQuZ2X6E+x1eTzXodqFTTzjn7tBF9Y27mq3Ky83he2eP5Wun7MfkGcu49L631F1VRDJCKr2PFrv7me5e7O793P1s4NwUtqsjdFn9OzAbeNzdZ5nZLWZ2ZkLRC4FHvSO2vI79GOx7Krz4/2Bt8+EuzIzPHz+S2y8Yz7RF6znvrtcoW1URU6AiIoHtzmexmS129yFpiGenJk6c6FOnTo3j0Ltn0zL4+Vgo6AJfXwg5udsVeW3eGq55+H9sranj/844gAsOHYxZdj8GW0T2LjOb5u4Td1Zud0dJ1SdWqroPgt77QPUmeO2XSYtMGtGXv117DAW5Odz01Lt84eG32bhV1Uki0vZ2Nyl0vKqedLpmCux/Jrx463a9kRr1617E9G9/mBtPHc0/Zq3k1Ntf5W8zV+h+BhFpU60mhVaGzN5kZhWEexYkVY3PXDCD+06B6uRtBzk5xtXHjeDJz02ia1EeVz80jQt/+wazlm1s44BFJFu1mhTcvZu7d08ydXP3vLYMskPo3BsueRLqq2Hyl5pGUk3moME9ee5Lx/D/zh7LnBUVfPRX/+HGJ99h2YbKNgxYRLKRnrzWloYfCyd8C2Y9BW/+ZodF83Jz+OQRQ3n5hg9x+VHDeWzKEo667UW+9sQM5q/e3EYBi0i22a3eR3Fqd72PWmpogB+PgMp1cPETMOrklDYrX7+V3746nwffWIQ7nH7gQD53/AjGlvRIc8Ai0hGk2vtISSEOVZvCnc5ry+DSZ6H0kJQ3XV1RzX3/XcA9r8yn3p3j9i3mymP2YdKIPuTkqFOYiCSnpJDpKlbC7QeGITA+/yb0HblLm2+srOWhNxbx839+QF2DM7xvFz42oYRTxg5kZL+uaQpaRNorJYX2YO08+N2HoaAzXPFP6DZgl3dRVVvP8zOX862nZ7G5OjzYp1N+LlceM5yPjB3AmIHddSOciCgptBtLp8EDZ4Qb3C6dHHop7ablGyv5+8wVPD9zBW8uWAfA0D6dOXF0f44e1YfDhveha6E6jolkIyWF9qTsBXjkAugzCj71NHTtt8e7XLO5mn++t5Lbnn+fTVW1uIfb0CcM7cVRI/owaWRfDh7Sk8K87YfdEJGOR0mhvZn3Ejx6EXQvgU/9BXokfR7RbqmqrWfaovX8t2wNv399IVuqw1DdOQZHjyrmqBF9OGpkX8YM7K7GapEOSkmhPbrraFj1XkgIn5oMvYen5TAbK2t5c/5aXpu3lsemLKGyNiSJ3Bxj0og+HFTak3GlPRhX2pP+3QvVJiHSASgptFdL34aHzoW8TnDRYzBwXNoPuWpTFa/NW8ubC9bxl+lL2Vqz7aE/uTlGp/xcPjou9Goa2a8ro/p3Y1CPIiULkXZESaE9WzkLHjoPKtfD2XeG5zK0oaraemYt28jMpZu486UyKmvrqayppy7h0aE5Fno5dSrI5Yqj9wnJol9XBvfuTK6qoEQyjpJCe1exEh7/FCx5A7qXwnXvJH0WQ1tat6WGslWbmbuqgrkrN/PU2+VU1tZTW7/t35A1Jov8XC6dNIyhfTozuHdnBvfqTN+uBbq6EImJkkJHUFcDz38Vpj0Aoz4M594DnXrFHdV2
NlXVUrZqc9P0eNROUV3X0KxcjkFhXi6FeTkU5udw5TH7UNqrE4N6hqlPFyUNkXRRUuhIpt4Hz34Zcgvg0mdgyOFxR5SSrTV1LFlXSfn6rSxZt5W7X5lHdV1DmGobqG/xb88MCnJzKMjLoSA3h3MOLqF/9yL6dS+kf/eiaCqkc4HutRDZVUoKHU35NHjycli/EHoOhS/9L/bqpD3h7mysrKV8fSXLNoTpN6/Op7qugdr6Bmqinw1J/nnmmpGfZ0wY0os5KyrIyzFyW0w3nz6GbkV5dC3Ko1tRfnhdkKcut5K1lBQ6oqqN8Mx1Yejt4cfBOXeHx312UO5ORXUdqzZVsXJTNSujnw+8toDaugaGF3dl5tKN1Dd4s0bwHckxyMvJ2S6JbJdYrPnyOy+eQNfCPIrycynKz1VjurQ7SgodlTv87w/wzLWAwek/hQmXQk52PxrD3amsraeiqo6Kqlo2VdWxuaquaf7XL5dR3+BNCaS+xVTX4NS77+jZR80YkGNGTk74aRZ+7j+wO3NXVkTrjJxo+bafYRtrfN1yfU6LsgZ/vPIICvNy1N4ie0RJoaNbOy8khoX/hsLucOVLuzzSqmyvui4klsaE8tUnZjRLHg3uNDg0uOMJrxsawusxg7rzTvnGqJzT0EDTa/c9e7i5JUsw1jwpEf7DzKKfcPKYAbz4/kqMxnKNSYmEZeFn4nYt99N8nTUvE+WrxvmmmJv+F348etWRXHDP6xmV4B777JF84jevxx1GMzuK6bHPHrlb+1RSyAaNVw1/vxnqquCYL8OkL0JBl7gjk1bU1TdQVddAVW09n3lgyrakkphgHBoaWiSdZgko2TYh4zhEySdKQg49OuezfkvNXklM6WAtXlizdZZ8eYucYgmFmm/ffMPGconb71PchQWrt7TYPvH/rcQWJcmmRBztuLV0t6M8aC3mTty/Hy+8v2q7dX26FPD8dce2vqMdSDUpqBtHe2YGEz4Vuqv+7UZ4+Qcw7fdw4rdh3CeyvkopE+Xl5tA1N4euhXk8fc3RaT1W4zfNlt863b0peTQlimhZYuLwlq+h1XW0WJfIE158fOJgHp+6eNtyb1GmKcZtSxO3TyznLVY2W+ctykULt9se6NetiPJ1lQnrvJXto98uVsUzAAAS9klEQVSvqcz2ibjlsZPx7V4krgsLX3x/FRu21jRf57TJKMdpvVIws1OA24Fc4F53vy1JmfOB7xBO0Qx3v2hH+9SVwg4seh3+eB7UbIaBB8FHvg/D0vvBI7K3xFWFo+qjFuXSlRTMLBf4ADgZKAemABe6+3sJZUYBjwMnuPt6M+vn7qt2tF8lhZ1oaICZT8JfvgD11TD6o3DSd9XeIJLlUk0K6axfOAwoc/f57l4DPAqc1aLMlcCd7r4eYGcJQVKQkwPjPg43LoITvgVznoc7DoGnPgtryuKOTkQyXDqTQgmwJGG+PFqWaF9gXzP7r5m9EVU3bcfMrjKzqWY2dfXq1WkKt4PJ7wTH3gBfeR+OvAbefTxKDlfBmrlxRyciGSqdSSFZW3vLuqo8YBRwPHAhcK+Z9dxuI/d73H2iu08sLi7e64F2aF37wUduha/MCT2T3v0T3DERHrsEylUNJyLNpTMplAODE+ZLgWVJyvzF3WvdfQEwh5AkZG/r2g8+/D34ygdw7FdDtdK9J8L9p8EHf995lwkRyQrpTApTgFFmNtzMCoALgMktyjwNfAjAzPoSqpPmpzEm6VoMJ9wMX18EH/lBuFp4+Hy4axJMfySMzCoiWSttScHd64BrgL8Ds4HH3X2Wmd1iZmdGxf4OrDWz94CXgK+6+9p0xSQJCrvCkZ+HbyyFc34TBtp7+mr45Xh47Y7wgB8RyTq6o1kCdyj7FzxxBVRvhLyi8MS3iVdAyYQd344pIhkvE7qkSntiBqNOhpsWw2f/DUU9YcYjcO8J8IMSmHo/VG+OO0oRSTNdKUjrqjbBPcdDxXKo3QqWC+MvgvEXw5AjdPUg0o5o7CPZ
c0Xd4Utvh6ql8inhamHmU2EQvrwiOOYrcNAF0HNI3JGKyF6iKwXZNdWbYfZkmP5wGLYbYNgxITnsfwYU9Yg3PhFJKvaxj9JFSSGDrF8E7zwG//5pGLrbcqBTbzjzVzDyRMgrjDtCEYkoKUjbcQ/3O7z7eBiMb+vakCBGngz7nRquILr0jTtKkaympCDxqK+F+S/D3H/A2w+GKwgIvZlO+o4ShEhMlBQkfu6w4l1472mY9WdYF92svs/xcMA5MPoM6NInzghFsoaSgmSWxgQx68/wxq+bX0Gc/N3w3AddQYikjZKCZC53WPEOPHIRbF2zLUEMOhhGnBgaqUsPhdz8eOMU6UCUFKR9aEwQc/4Gr/8KqivCcssNjdQjTghJotewWMMUae+UFKR9qtwAC16BshdgxqPhkaIQbpab8KlwJTHs6DCgn4ikTElB2j/38JS4h8+Hqg1QtRG8ATAo7A7HXB+SxIADNeSGyE4oKUjHU1cNi1+Hpz8fhvau3RqW5+SHEV2HHgmlh0Hx6PCsahFporGPpOPJKwzdWb/8XpivWAHzXgxVTTOfhHceDcstF/Y5LiSIwYdCyUTotN1TXkUkCV0pSMfgHu6DWPIWlL8V2iMaryQAivcPCaL0sDDCa5+RqnKSrKLqI5HqClg6DZZMgSVvhpFeqzaEdZ37huQw5MgwDRynLrDSoan6SKSwW6hu2uf4MN/QAGvnwuI3oul1eP/ZsM5yQvnDrw7JovTQMC+SZZQUJHvk5EDxfmE65NKwrGJFSA6NSeKVH24rX9AlPFBo8OFh6jk4nrhF2pCqj0QSVVeEaqZFr8OSN2Dhf6JusEBuIRz4cRh+LJQcAr33US8naTdUfSSyOwq7hbuoR5wQ5uvrYOXM0IC98N+hh9P0h8I6yw030pVMgEETQqLoPkgN2NKuKSmI7EhuHgwaH6bDr4KGelj1Hix9G5a9De/+CRa8CkRX3N0GQl1NSC5n/zpsV9Al1l9BZFeo+khkT9VWhRFgl70dHjY0e/K2Qf4A8ruEYTmO+zoMHA/9D4D8ovjilaykLqkicdqyJnSHLZ8Cb/0WajZDQ1200iC/MxR0DcninN/AgLGQ3ynWkKVjy4ikYGanALcDucC97n5bi/WXAT8GlkaL7nD3e3e0TyUFaZfcYcNiWD4d/vaNkCSaJQq2JYpjbwhXFAMOhILO8cUsHUrsDc1mlgvcCZwMlANTzGyyu7/Xouhj7n5NuuIQyQhm0GtomMacFZa5w8bykCiWTYcpvwtjOj3/tW3b5XcObRIFXeHsu0Ki0AixkkbpbGg+DChz9/kAZvYocBbQMimIZCezcO9Dz8Hh2dUnfiskik3LQqJ4/uvhaqJqA2xZDfefErbLK9qWLPK7wAUPRd1jc+P9faRDSGdSKAGWJMyXA4cnKfcxMzsW+AC43t2XtCxgZlcBVwEMGTIkDaGKZAgz6FESptGnb1u+aTn84Ryo2RLGdKrZApXrwro7JoY7svM7hSRx1LXQfwz0Hwtd+8Xze0i7lc6kkKyzdssGjGeAR9y92syuBn4PnLDdRu73APdAaFPY24GKZLzuA+ELbzRfVlsJq+fAk5/Zliwq18M/vrmtTE5+GFK8/1joNyYki+L91VYhrUpnUigHEscFKAWWJRZw97UJs78FfoiIpCa/U7gP4ostOl5sWQMrZ8Ez10LNVqjeDG/eve3ObAhVUAVd4LCromRxAPQarju0Ja1JYQowysyGE3oXXQBclFjAzAa6+/Jo9kxgdhrjEckOXfqG50lcO33bsoZ6WL8wJIuVs2DKb8PVxcs/2FbGchLaKjqHqqiLH4PuJbpLO4uku0vqacAvCF1S73P3W83sFmCqu082sx8QkkEdsA74nLu/v6N9qkuqyF5UswVWvx8li/dgxsNhWWJX2cb2irzOUbtFJzjvd+GZFLq3ot3IiPsU0kFJQSTN3GHzKlgzB565LrRd1FaGNov66uZlG++tOObLMGBcaLPo1CueuGWH
lBREZO+rrYS18+CJy6NeUJtDm0VD7bYyuQWh6qmgcxjao++ocFXRpVjVUDFSUhCRtlOxIoz/9NcboHZLaOCu3UqzDoeWG0aS3bAoXGGc/pOQLHoM1j0WbSD2O5pFJIt0GxCm62ZsW9ZQDxuXwJoyeO6GcJWRWwBb10HDSnjoY1FBi9osisLP474ekkWfEdC1v64u2piuFESk7W1ZE+6xWDcPXr4tJIy6qvCz5dVFfhHkRQ3ceZ3gnLtDwujUM7bw2yNVH4lI+9NQH8aDWlsW2i7WlsE7j0NdZfPhyAFy80OPqAM/Bn33jdouRkGPUlVHJaGkICIdS111uNdibRmsmQuv3xmSRe3WkEyaWLi62OdDYUyoPiOg94jws9ugrL1BT20KItKx5BVC8X5hAjj6uvDTPVRHrfkA1s4NVxjr5sO8F2HO8zSvjsoJbRd5nWDCJduSRe8RoU1E7RdKCiLSzplB1+IwDTuq+bqGBti0NLRdNCaL6X8MVxf//SXbJ4yo7eKQS5snjC59syZhqPpIRLJTfV3oHbVuHvz1q1FVVFXy9oumBu/oKuO4r4axonoPD8OAtIM2DLUpiIjsrvra8KS8tfNC0vjP7SFR1FWGto1mAz5bqNrK67QtcXzk1pA0eg3LmBFplRRERNKhsYfU+gXw7FcSkkVVuNLw+ublcwuiK4wiOPQz4eqi8Sqjc582q5ZSUhARaWvu4ZkW6xaEpPHCLdvuwairgvqa5uUtNySLkSdsSxSNVxg9BkPu3mv2Ve8jEZG2Zgade4ep9BA48Lzm62srQ7XU45dtu7qoq4K5/4raMZJVS0VXGUd+AUadDP32T++voCsFEZEM0NAAFcvCvRiNVxrrFsDcf4SE0VAHvUfCl6bt1u51pSAi0p7k5IS7sXuUwrCjt19fuaFNejkpKYiItAdtNNZTdt7vLSIiSSkpiIhIEyUFERFpoqQgIiJNlBRERKSJkoKIiDRRUhARkSZKCiIi0qTdDXNhZquBRbu5eV9gzV4MJx3aQ4zQPuJUjHuHYtw74o5xqLsX76xQu0sKe8LMpqYy9kec2kOM0D7iVIx7h2LcO9pDjKDqIxERSaCkICIiTbItKdwTdwApaA8xQvuIUzHuHYpx72gPMWZXm4KIiOxYtl0piIjIDigpiIhIk6xJCmZ2ipnNMbMyM7sx7ngAzGywmb1kZrPNbJaZXRst721m/zSzudHPXhkQa66Z/c/Mno3mh5vZm1GMj5lZQczx9TSzJ8zs/eh8Hplp59HMro/e55lm9oiZFWXCeTSz+8xslZnNTFiW9NxZ8Mvo7+gdM5sQY4w/jt7vd8zsz2bWM2HdTVGMc8zsI3HFmLDuBjNzM+sbzcdyHlORFUnBzHKBO4FTgTHAhWY2Jt6oAKgDvuLu+wNHAF+I4roReMHdRwEvRPNxuxaYnTD/Q+DnUYzrgStiiWqb24G/ufto4CBCrBlzHs2sBPgSMNHdxwK5wAVkxnl8ADilxbLWzt2pwKhougq4K8YY/wmMdfdxwAfATQDR39AFwAHRNr+OPgPiiBEzGwycDCxOWBzXedyprEgKwGFAmbvPd/ca4FHgrJhjwt2Xu/vb0esKwgdZCSG230fFfg+cHU+EgZmVAqcD90bzBpwAPBEViTVGM+sOHAv8DsDda9x9Axl2HgmPv+1kZnlAZ2A5GXAe3f1VYF2Lxa2du7OABz14A+hpZgPjiNHd/+HuddHsG0BpQoyPunu1uy8AygifAW0eY+TnwNeAxF49sZzHVGRLUigBliTMl0fLMoaZDQMOBt4E+rv7cgiJA+gXX2QA/ILwj7ohmu8DbEj4g4z7fO4DrAbuj6q47jWzLmTQeXT3pcBPCN8WlwMbgWlk1nlM1Nq5y9S/pcuB56PXGROjmZ0JLHX3GS1WZUyMLWVLUrAkyzKmL66ZdQWeBK5z901xx5PIzD4KrHL3aYmLkxSN83zmAROAu9z9YGALmVHl1iSqkz8LGA4MAroQqhBayph/
l63ItPceM/smoSr2j42LkhRr8xjNrDPwTeDbyVYnWZYR7322JIVyYHDCfCmwLKZYmjGzfEJC+KO7PxUtXtl4KRn9XBVXfMBRwJlmtpBQ7XYC4cqhZ1QNAvGfz3Kg3N3fjOafICSJTDqPJwEL3H21u9cCTwGTyKzzmKi1c5dRf0tmdinwUeBi33bTVabEOILwJWBG9PdTCrxtZgPInBi3ky1JYQowKurpUUBohJocc0yNdfO/A2a7+88SVk0GLo1eXwr8pa1ja+TuN7l7qbsPI5y3F939YuAl4LyoWNwxrgCWmNl+0aITgffIoPNIqDY6wsw6R+97Y4wZcx5baO3cTQY+FfWeOQLY2FjN1NbM7BTg68CZ7r41YdVk4AIzKzSz4YTG3LfaOj53f9fd+7n7sOjvpxyYEP17zZjzuB13z4oJOI3QQ2Ee8M2444liOppwyfgOMD2aTiPU2b8AzI1+9o471ije44Fno9f7EP7QyoA/AYUxxzYemBqdy6eBXpl2HoHvAu8DM4E/AIWZcB6BRwjtHLWED64rWjt3hGqPO6O/o3cJvaniirGMUC/f+Ldzd0L5b0YxzgFOjSvGFusXAn3jPI+pTBrmQkREmmRL9ZGIiKRASUFERJooKYiISBMlBRERaaKkICIiTZQURESkiZKCSArMbLyZnZYwf6btpSHYzey6aEgEkdjpPgWRFJjZZYQbjK5Jw74XRvteswvb5Lp7/d6ORURXCtKhmNkwCw/Z+W30QJt/mFmnVsqOMLO/mdk0M/u3mY2Oln/cwoNwZpjZq9HQKLcAnzCz6Wb2CTO7zMzuiMo/YGZ3WXhg0nwzOy564MpsM3sg4Xh3mdnUKK7vRsu+RBgg7yUzeyladqGZvRvF8MOE7Teb2S1m9iZwpJndZmbvRQ9p+Ul6zqhknbhvqdakaW9OwDDCiJnjo/nHgUtaKfsCMCp6fThhXCcIww6URK97Rj8vA+5I2LZpnvBwlUcJQxecBWwCDiR86ZqWEEvjUBG5wMvAuGh+IduGPxhEGCepmDD664vA2dE6B85v3BdhCAdLjFOTpj2ddKUgHdECd58evZ5GSBTNRMOVTwL+ZGbTgd8AjQ85+S/wgJldSfgAT8Uz7u6EhLLSw2BoDcCshOOfb2ZvA/8jPBUs2dP/DgVe9jCaauNw0MdG6+oJI+pCSDxVwL1mdi6wdbs9ieyGvJ0XEWl3qhNe1wPJqo9yCA+4Gd9yhbtfbWaHE542N93Mtiuzg2M2tDh+A5AXjdZ5A3Cou6+PqpWKkuwn2Tj7jao8akdw9zozO4ww2uoFwDWEYc1F9oiuFCQreXiY0QIz+zg0PUj9oOj1CHd/092/DawhjHtfAXTbg0N2Jzz8Z6OZ9af5A3YS9/0mcJyZ9Y2eK3wh8ErLnUVXOj3c/TngOsIosSJ7TFcKks0uBu4ys5uBfEK7wAzgx2Y2ivCt/YVo2WLgxqiq6Qe7eiB3n2Fm/yNUJ80nVFE1ugd43syWu/uHzOwmwnMWDHjO3ZM9Y6Eb8BczK4rKXb+rMYkkoy6pIiLSRNVHIiLSRNVH0uGZ2Z2EZ00nut3d748jHpFMpuojERFpouojERFpoqQgIiJNlBRERKSJkoKIiDT5/389b7FXn3gUAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1538a3c8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 画出多类分的logloss 结果图\n",
    "test_means = cvresult_round1['test-mlogloss-mean']\n",
    "test_stds = cvresult_round1['test-mlogloss-std'] \n",
    "        \n",
    "train_means = cvresult_round1['train-mlogloss-mean']\n",
    "train_stds = cvresult_round1['train-mlogloss-std'] \n",
    "\n",
    "x_axis = range(0, cvresult_round1.shape[0])\n",
    "        \n",
    "plt.errorbar(x_axis, test_means, yerr=test_stds ,label='Test')\n",
    "plt.errorbar(x_axis, train_means, yerr=train_stds ,label='Train')\n",
    "plt.title(\"XGBoost n_estimators vs Log Loss\")\n",
    "plt.xlabel( 'n_estimators' )\n",
    "plt.ylabel( 'Log Loss' )\n",
    "plt.savefig( 'first_round_nestimators.png' )\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "149\n"
     ]
    }
   ],
   "source": [
    "#初步确认所需的弱学习器\n",
    "n_estimators = cvresult_round1.shape[0]\n",
    "print n_estimators"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "初步确认所需的弱学习器为149个"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "#重新设置弱学习器个数\n",
    "xgb1.n_estimators = n_estimators"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('first term of  train logloss:', 0.48183621518737246)\n"
     ]
    }
   ],
   "source": [
    "# 根据xgb的cv后最优参数对模型进行训练\n",
    "xgb1.fit(x_train, y_train, eval_metric='mlogloss')\n",
    "        \n",
    "#P输出评价得分:\n",
    "train_predprob = xgb1.predict_proba(x_train)\n",
    "logloss = log_loss(y_train, train_predprob)\n",
    "print(\"first term of  train logloss:\",logloss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## （5） 对树的最大深度（可选）和min_children_weight进行调优（可选）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.1 最大深度粗调整"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 3, 5, 7, 9]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#最大深度list 采取1,3,5,7,9\n",
    "max_depth_list = range(1,10,2)\n",
    "# min_child_weight_list = range(1,10,2)\n",
    "param_List_1 = {'max_depth':max_depth_list}\n",
    "max_depth_list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GridSearchCV(cv=3, error_score='raise',\n",
       "       estimator=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=0.7,\n",
       "       colsample_bytree=0.8, gamma=0, learning_rate=0.1, max_delta_step=0,\n",
       "       max_depth=5, min_child_weight=1, missing=None, n_estimators=149,\n",
       "       n_jobs=1, nthread=None, objective='multi:softprob', random_state=0,\n",
       "       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=3, silent=True,\n",
       "       subsample=0.8),\n",
       "       fit_params=None, iid=True, n_jobs=-1,\n",
       "       param_grid={'max_depth': [1, 3, 5, 7, 9]}, pre_dispatch='2*n_jobs',\n",
       "       refit=True, return_train_score='warn', scoring='neg_log_loss',\n",
       "       verbose=0)"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# GridSearchCV调优\n",
    "grid1 = GridSearchCV(estimator =xgb1,param_grid = param_List_1, scoring='neg_log_loss',n_jobs=-1,cv=3)\n",
    "# 训练最优\n",
    "grid1.fit(x_train,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.66048, std: 0.00215, params: {'max_depth': 1},\n",
       "  mean: -0.61794, std: 0.00386, params: {'max_depth': 3},\n",
       "  mean: -0.60938, std: 0.00417, params: {'max_depth': 5},\n",
       "  mean: -0.61657, std: 0.00559, params: {'max_depth': 7},\n",
       "  mean: -0.63479, std: 0.00532, params: {'max_depth': 9}],\n",
       " {'max_depth': 5},\n",
       " -0.6093771827279978)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grid1.grid_scores_, grid1.best_params_,     grid1.best_score_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.2 最大深度细调整"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GridSearchCV(cv=4, error_score='raise',\n",
       "       estimator=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=0.7,\n",
       "       colsample_bytree=0.8, gamma=0, learning_rate=0.1, max_delta_step=0,\n",
       "       max_depth=5, min_child_weight=1, missing=None, n_estimators=149,\n",
       "       n_jobs=1, nthread=None, objective='multi:softprob', random_state=0,\n",
       "       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=3, silent=True,\n",
       "       subsample=0.8),\n",
       "       fit_params=None, iid=True, n_jobs=-1,\n",
       "       param_grid={'max_depth': [4, 5, 6]}, pre_dispatch='2*n_jobs',\n",
       "       refit=True, return_train_score='warn', scoring='neg_log_loss',\n",
       "       verbose=0)"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "xgb1.max_depth = grid1.best_params_['max_depth']\n",
    "param_List_2 = {'max_depth':[4,5,6]}\n",
    "# GridSearchCV调优\n",
    "grid2 = GridSearchCV(estimator =xgb1,param_grid = param_List_2, scoring='neg_log_loss',n_jobs=-1,cv=4)\n",
    "# 训练最优\n",
    "grid2.fit(x_train,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.61107, std: 0.00775, params: {'max_depth': 4},\n",
       "  mean: -0.60763, std: 0.00601, params: {'max_depth': 5},\n",
       "  mean: -0.60996, std: 0.00767, params: {'max_depth': 6}],\n",
       " {'max_depth': 5},\n",
       " -0.6076305994957608)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grid2.grid_scores_, grid2.best_params_, grid2.best_score_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "根据细调优发现，最大的深度还是5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.3 min_child_weight 粗调优"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "#设置 max_depth最佳参数\n",
    "xgb1.max_depth = grid2.best_params_['max_depth']\n",
    "#min_child_weight_list 采取1,3,5,7,9\n",
    "min_child_weight_list = range(1,10,2)\n",
    "param_List_3 = {'min_child_weight':min_child_weight_list}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "# GridSearchCV调优\n",
    "grid3= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_3, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "GridSearchCV(cv=4, error_score='raise',\n",
       "       estimator=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=0.7,\n",
       "       colsample_bytree=0.8, gamma=0, learning_rate=0.1, max_delta_step=0,\n",
       "       max_depth=5, min_child_weight=1, missing=None, n_estimators=149,\n",
       "       n_jobs=1, nthread=None, objective='multi:softprob', random_state=0,\n",
       "       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=3, silent=True,\n",
       "       subsample=0.8),\n",
       "       fit_params=None, iid=True, n_jobs=-1,\n",
       "       param_grid={'min_child_weight': [1, 3, 5, 7, 9]},\n",
       "       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',\n",
       "       scoring='neg_log_loss', verbose=0)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "grid3.fit(x_train,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.60763, std: 0.00601, params: {'min_child_weight': 1},\n",
       "  mean: -0.60755, std: 0.00608, params: {'min_child_weight': 3},\n",
       "  mean: -0.60753, std: 0.00571, params: {'min_child_weight': 5},\n",
       "  mean: -0.60794, std: 0.00640, params: {'min_child_weight': 7},\n",
       "  mean: -0.60857, std: 0.00678, params: {'min_child_weight': 9}],\n",
       " {'min_child_weight': 5},\n",
       " -0.6075300043947762)"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Show tuning results\n",
    "grid3.grid_scores_, grid3.best_params_, grid3.best_score_"
   ]
  },
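  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note: `grid_scores_` was deprecated in scikit-learn 0.18 and removed in 0.20, which is what the warning above is about. A minimal sketch, assuming `grid3` has been fitted, of the same summary through the replacement `cv_results_` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch: cv_results_ replaces the deprecated grid_scores_\n",
    "results = pd.DataFrame(grid3.cv_results_)\n",
    "results[['params', 'mean_test_score', 'std_test_score']]"
   ]
  },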
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5.4 Fine tuning of min_child_weight"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fix min_child_weight at the best value from the coarse search\n",
    "xgb1.min_child_weight = grid3.best_params_['min_child_weight']\n",
    "# Fine grid around the coarse optimum of 5\n",
    "min_child_weight_list1 = [4,5,6]\n",
    "param_List_4 = {'min_child_weight':min_child_weight_list1}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.60692, std: 0.00668, params: {'min_child_weight': 4},\n",
       "  mean: -0.60753, std: 0.00571, params: {'min_child_weight': 5},\n",
       "  mean: -0.60777, std: 0.00702, params: {'min_child_weight': 6}],\n",
       " {'min_child_weight': 4},\n",
       " -0.6069152105113724)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Fine-tune min_child_weight with GridSearchCV\n",
    "grid4= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_4, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid4.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid4.grid_scores_, grid4.best_params_, grid4.best_score_\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fine-grained search shows the best min_child_weight is 4."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set xgboost's parameter from grid4's best value\n",
    "xgb1.min_child_weight = grid4.best_params_['min_child_weight']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## （6） Tuning the row and column subsampling parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.1 Coarse tuning of subsample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Coarse grid for subsample: 0.3 to 1\n",
    "subsample_list1 = [0.3,0.4,0.5,0.6,0.7,0.8,0.9,1]\n",
    "param_List_5 = {'subsample':subsample_list1}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.61636, std: 0.00844, params: {'subsample': 0.3},\n",
       "  mean: -0.61409, std: 0.00725, params: {'subsample': 0.4},\n",
       "  mean: -0.61139, std: 0.00785, params: {'subsample': 0.5},\n",
       "  mean: -0.61030, std: 0.00761, params: {'subsample': 0.6},\n",
       "  mean: -0.60881, std: 0.00751, params: {'subsample': 0.7},\n",
       "  mean: -0.60692, std: 0.00668, params: {'subsample': 0.8},\n",
       "  mean: -0.60649, std: 0.00749, params: {'subsample': 0.9},\n",
       "  mean: -0.60945, std: 0.00787, params: {'subsample': 1}],\n",
       " {'subsample': 0.9},\n",
       " -0.6064902164946)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Coarse-tune subsample with GridSearchCV\n",
    "grid5= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_5, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid5.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid5.grid_scores_, grid5.best_params_, grid5.best_score_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update xgboost with the tuned best parameter\n",
    "xgb1.subsample = grid5.best_params_['subsample']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.2 Fine tuning of subsample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.60692, std: 0.00668, params: {'subsample': 0.8},\n",
       "  mean: -0.60766, std: 0.00691, params: {'subsample': 0.85},\n",
       "  mean: -0.60649, std: 0.00749, params: {'subsample': 0.9},\n",
       "  mean: -0.60780, std: 0.00725, params: {'subsample': 0.95}],\n",
       " {'subsample': 0.9},\n",
       " -0.6064902164946)"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "subsample_list2 = [0.8,0.85,0.9,0.95]\n",
    "param_List_6 = {'subsample':subsample_list2}\n",
    "\n",
    "# Fine-tune subsample with GridSearchCV\n",
    "grid6= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_6, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid6.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid6.grid_scores_, grid6.best_params_, grid6.best_score_"
   ]
  },
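  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The fine search still picks subsample = 0.9. To keep `xgb1` in sync with every tuning step, the result can be written back with a one-line sketch using grid6's best parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write the fine-tuned subsample back into the model\n",
    "xgb1.subsample = grid6.best_params_['subsample']"
   ]
  },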
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.3 Coarse tuning of colsample_bytree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.61101, std: 0.00647, params: {'colsample_bytree': 0.3},\n",
       "  mean: -0.60861, std: 0.00658, params: {'colsample_bytree': 0.4},\n",
       "  mean: -0.60867, std: 0.00810, params: {'colsample_bytree': 0.5},\n",
       "  mean: -0.60748, std: 0.00646, params: {'colsample_bytree': 0.6},\n",
       "  mean: -0.60886, std: 0.00701, params: {'colsample_bytree': 0.7},\n",
       "  mean: -0.60649, std: 0.00749, params: {'colsample_bytree': 0.8},\n",
       "  mean: -0.60780, std: 0.00695, params: {'colsample_bytree': 0.9},\n",
       "  mean: -0.60935, std: 0.00766, params: {'colsample_bytree': 1}],\n",
       " {'colsample_bytree': 0.8},\n",
       " -0.6064902164946)"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "colsample_bytree_list1 = [0.3,0.4,0.5,0.6,0.7,0.8,0.9,1]\n",
    "param_List_7 = {'colsample_bytree':colsample_bytree_list1}\n",
    "\n",
    "# Coarse-tune colsample_bytree with GridSearchCV\n",
    "grid7= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_7, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid7.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid7.grid_scores_, grid7.best_params_, grid7.best_score_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update xgboost with the tuned best parameter\n",
    "xgb1.colsample_bytree = grid7.best_params_['colsample_bytree']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6.4 Fine tuning of colsample_bytree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.60931, std: 0.00821, params: {'colsample_bytree': 0.75},\n",
       "  mean: -0.60649, std: 0.00749, params: {'colsample_bytree': 0.8},\n",
       "  mean: -0.60814, std: 0.00640, params: {'colsample_bytree': 0.85}],\n",
       " {'colsample_bytree': 0.8},\n",
       " -0.6064902164946)"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "colsample_bytree_list2 = [0.75,0.8,0.85]\n",
    "param_List_8 = {'colsample_bytree':colsample_bytree_list2}\n",
    "\n",
    "# Fine-tune colsample_bytree with GridSearchCV\n",
    "grid8= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_8, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid8.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid8.grid_scores_, grid8.best_params_, grid8.best_score_"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The best colsample_bytree parameter is 0.8."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update xgboost with the tuned best parameter\n",
    "xgb1.colsample_bytree = grid8.best_params_['colsample_bytree']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## （7） Tuning the regularization parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.1 Coarse tuning of reg_alpha"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.60635, std: 0.00691, params: {'reg_alpha': 0.001},\n",
       "  mean: -0.60724, std: 0.00676, params: {'reg_alpha': 0.01},\n",
       "  mean: -0.60755, std: 0.00597, params: {'reg_alpha': 0.1},\n",
       "  mean: -0.60547, std: 0.00684, params: {'reg_alpha': 1},\n",
       "  mean: -0.61181, std: 0.00634, params: {'reg_alpha': 10},\n",
       "  mean: -0.65980, std: 0.00446, params: {'reg_alpha': 100},\n",
       "  mean: -0.79226, std: 0.00186, params: {'reg_alpha': 1000}],\n",
       " {'reg_alpha': 1},\n",
       " -0.6054671792483267)"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg_alpha_List1 = [0.001,0.01,0.1,1,10,100,1000]\n",
    "param_List_9 = {'reg_alpha':reg_alpha_List1}\n",
    "\n",
    "# Coarse-tune reg_alpha with GridSearchCV\n",
    "grid9= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_9, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid9.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid9.grid_scores_, grid9.best_params_, grid9.best_score_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update xgboost with the tuned best parameter\n",
    "xgb1.reg_alpha = grid9.best_params_['reg_alpha']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.2 Fine tuning of reg_alpha"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.60693, std: 0.00612, params: {'reg_alpha': 0.5},\n",
       "  mean: -0.60539, std: 0.00645, params: {'reg_alpha': 0.7},\n",
       "  mean: -0.60547, std: 0.00681, params: {'reg_alpha': 0.9},\n",
       "  mean: -0.60547, std: 0.00684, params: {'reg_alpha': 1},\n",
       "  mean: -0.60638, std: 0.00698, params: {'reg_alpha': 1.5},\n",
       "  mean: -0.60637, std: 0.00625, params: {'reg_alpha': 2},\n",
       "  mean: -0.60632, std: 0.00641, params: {'reg_alpha': 2.5},\n",
       "  mean: -0.60603, std: 0.00635, params: {'reg_alpha': 3},\n",
       "  mean: -0.60645, std: 0.00658, params: {'reg_alpha': 3.5},\n",
       "  mean: -0.60802, std: 0.00672, params: {'reg_alpha': 4},\n",
       "  mean: -0.60741, std: 0.00644, params: {'reg_alpha': 4.5},\n",
       "  mean: -0.60855, std: 0.00645, params: {'reg_alpha': 5},\n",
       "  mean: -0.60842, std: 0.00655, params: {'reg_alpha': 5.5},\n",
       "  mean: -0.60853, std: 0.00681, params: {'reg_alpha': 6},\n",
       "  mean: -0.60899, std: 0.00649, params: {'reg_alpha': 6.5},\n",
       "  mean: -0.60939, std: 0.00570, params: {'reg_alpha': 7},\n",
       "  mean: -0.60942, std: 0.00617, params: {'reg_alpha': 7.5},\n",
       "  mean: -0.61033, std: 0.00648, params: {'reg_alpha': 8},\n",
       "  mean: -0.61042, std: 0.00634, params: {'reg_alpha': 8.5},\n",
       "  mean: -0.61077, std: 0.00583, params: {'reg_alpha': 9},\n",
       "  mean: -0.61116, std: 0.00672, params: {'reg_alpha': 9.5}],\n",
       " {'reg_alpha': 0.7},\n",
       " -0.6053877814600167)"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg_alpha_List2 = [0.5,0.7,0.9,1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5]\n",
    "param_List_10 = {'reg_alpha':reg_alpha_List2}\n",
    "\n",
    "# Fine-tune reg_alpha with GridSearchCV\n",
    "grid10= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_10, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid10.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid10.grid_scores_, grid10.best_params_, grid10.best_score_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update xgboost with the tuned best parameter\n",
    "xgb1.reg_alpha = grid10.best_params_['reg_alpha']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.3 Coarse tuning of reg_lambda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.60617, std: 0.00704, params: {'reg_lambda': 0.001},\n",
       "  mean: -0.60664, std: 0.00736, params: {'reg_lambda': 0.01},\n",
       "  mean: -0.60616, std: 0.00670, params: {'reg_lambda': 0.1},\n",
       "  mean: -0.60539, std: 0.00645, params: {'reg_lambda': 1},\n",
       "  mean: -0.60717, std: 0.00606, params: {'reg_lambda': 10},\n",
       "  mean: -0.61321, std: 0.00622, params: {'reg_lambda': 100}],\n",
       " {'reg_lambda': 1},\n",
       " -0.6053877814600167)"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg_lambda_List1 = [0.001,0.01,0.1,1,10,100]\n",
    "param_List_11 = {'reg_lambda':reg_lambda_List1}\n",
    "\n",
    "# Coarse-tune reg_lambda with GridSearchCV\n",
    "grid11= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_11, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid11.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid11.grid_scores_, grid11.best_params_, grid11.best_score_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update xgboost with the tuned best parameter\n",
    "xgb1.reg_lambda = grid11.best_params_['reg_lambda']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 7.4 Fine tuning of reg_lambda"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\model_selection\\_search.py:761: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20\n",
      "  DeprecationWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "([mean: -0.60618, std: 0.00631, params: {'reg_lambda': 0.5},\n",
       "  mean: -0.60711, std: 0.00697, params: {'reg_lambda': 0.7},\n",
       "  mean: -0.60667, std: 0.00670, params: {'reg_lambda': 0.9},\n",
       "  mean: -0.60539, std: 0.00645, params: {'reg_lambda': 1},\n",
       "  mean: -0.60605, std: 0.00696, params: {'reg_lambda': 1.5},\n",
       "  mean: -0.60579, std: 0.00683, params: {'reg_lambda': 2},\n",
       "  mean: -0.60576, std: 0.00620, params: {'reg_lambda': 2.5},\n",
       "  mean: -0.60646, std: 0.00555, params: {'reg_lambda': 3},\n",
       "  mean: -0.60691, std: 0.00622, params: {'reg_lambda': 3.5},\n",
       "  mean: -0.60679, std: 0.00655, params: {'reg_lambda': 4},\n",
       "  mean: -0.60650, std: 0.00650, params: {'reg_lambda': 4.5},\n",
       "  mean: -0.60682, std: 0.00634, params: {'reg_lambda': 5},\n",
       "  mean: -0.60662, std: 0.00654, params: {'reg_lambda': 5.5},\n",
       "  mean: -0.60671, std: 0.00594, params: {'reg_lambda': 6},\n",
       "  mean: -0.60635, std: 0.00715, params: {'reg_lambda': 6.5},\n",
       "  mean: -0.60643, std: 0.00611, params: {'reg_lambda': 7},\n",
       "  mean: -0.60638, std: 0.00601, params: {'reg_lambda': 7.5},\n",
       "  mean: -0.60631, std: 0.00648, params: {'reg_lambda': 8},\n",
       "  mean: -0.60646, std: 0.00588, params: {'reg_lambda': 8.5},\n",
       "  mean: -0.60671, std: 0.00597, params: {'reg_lambda': 9},\n",
       "  mean: -0.60683, std: 0.00625, params: {'reg_lambda': 9.5}],\n",
       " {'reg_lambda': 1},\n",
       " -0.6053877814600167)"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg_lambda_List2 = [0.5,0.7,0.9,1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5]\n",
    "param_List_12 = {'reg_lambda':reg_lambda_List2}\n",
    "\n",
    "# Fine-tune reg_lambda with GridSearchCV\n",
    "grid12= GridSearchCV(estimator =xgb1,\n",
    "                       param_grid = param_List_12, \n",
    "                       scoring='neg_log_loss',\n",
    "                       n_jobs=-1, \n",
    "                       cv=4)\n",
    "grid12.fit(x_train,y_train)\n",
    "# Show tuning results\n",
    "grid12.grid_scores_, grid12.best_params_, grid12.best_score_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Update xgboost with the tuned best parameter\n",
    "xgb1.reg_lambda = grid12.best_params_['reg_lambda']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## （8） Review the best tuned parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "reg_alpha best: 0.7\n",
      "reg_lambda best: 1\n",
      "colsample_bytree best: 0.8\n",
      "subsample best: 0.9\n",
      "max_depth best: 5\n",
      "min_child_weight best: 4\n"
     ]
    }
   ],
   "source": [
    "print(\"reg_alpha best:\", xgb1.reg_alpha)\n",
    "print(\"reg_lambda best:\", xgb1.reg_lambda)\n",
    "print(\"colsample_bytree best:\", xgb1.colsample_bytree)\n",
    "print(\"subsample best:\", xgb1.subsample)\n",
    "print(\"max_depth best:\", xgb1.max_depth)\n",
    "print(\"min_child_weight best:\", xgb1.min_child_weight)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## （9） Lower the learning rate and re-tune the number of weak learners"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "149 0.1\n"
     ]
    }
   ],
   "source": [
    "print(xgb1.n_estimators, xgb1.learning_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The previous best weak-learner count was 149 at a learning rate of 0.1; now lower the learning rate and re-tune."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configure a fresh XGBoost model with a lower learning rate\n",
    "xgb2 = XGBClassifier(\n",
    "        learning_rate =0.01,\n",
    "        n_estimators=2000,  # a large cap is fine; cv with early stopping picks a suitable n_estimators\n",
    "        max_depth=xgb1.max_depth,\n",
    "        min_child_weight=xgb1.min_child_weight,\n",
    "        gamma=0,\n",
    "        subsample=xgb1.subsample,\n",
    "        colsample_bytree=xgb1.colsample_bytree,\n",
    "        colsample_bylevel=0.7,\n",
    "        reg_alpha=xgb1.reg_alpha,    # carry over the tuned regularization\n",
    "        reg_lambda=xgb1.reg_lambda,\n",
    "        objective= 'multi:softprob',\n",
    "        seed=3)\n",
    "\n",
    "xgb_param2 = xgb2.get_xgb_params()\n",
    "# three-class problem, so set num_class to 3\n",
    "xgb_param2['num_class'] = 3\n",
    "# choose the number of weak learners with xgboost's built-in cross-validation\n",
    "cvresult_round2 = xgb.cv(xgb_param2, xgtrain, num_boost_round=xgb2.get_params()['n_estimators'], folds =kfold,metrics='mlogloss', early_stopping_rounds=50)\n"
   ]
  },
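  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When `early_stopping_rounds` is set, `xgb.cv` returns an evaluation history truncated at the best iteration, so its row count gives the new number of weak learners. A minimal sketch, assuming `cvresult_round2` is the DataFrame returned above:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The number of rows in the cv history is the chosen boosting-round count\n",
    "xgb2.n_estimators = cvresult_round2.shape[0]\n",
    "print(xgb2.n_estimators)"
   ]
  },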
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>test-mlogloss-mean</th>\n",
       "      <th>test-mlogloss-std</th>\n",
       "      <th>train-mlogloss-mean</th>\n",
       "      <th>train-mlogloss-std</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1.092626</td>\n",
       "      <td>0.000038</td>\n",
       "      <td>1.092429</td>\n",
       "      <td>0.000038</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1.086694</td>\n",
       "      <td>0.000064</td>\n",
       "      <td>1.086319</td>\n",
       "      <td>0.000065</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1.080849</td>\n",
       "      <td>0.000085</td>\n",
       "      <td>1.080284</td>\n",
       "      <td>0.000125</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1.075148</td>\n",
       "      <td>0.000074</td>\n",
       "      <td>1.074393</td>\n",
       "      <td>0.000161</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1.069515</td>\n",
       "      <td>0.000120</td>\n",
       "      <td>1.068558</td>\n",
       "      <td>0.000208</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1.063965</td>\n",
       "      <td>0.000113</td>\n",
       "      <td>1.062826</td>\n",
       "      <td>0.000219</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>1.058490</td>\n",
       "      <td>0.000097</td>\n",
       "      <td>1.057152</td>\n",
       "      <td>0.000219</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>1.053182</td>\n",
       "      <td>0.000127</td>\n",
       "      <td>1.051654</td>\n",
       "      <td>0.000242</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>1.047893</td>\n",
       "      <td>0.000138</td>\n",
       "      <td>1.046154</td>\n",
       "      <td>0.000264</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>1.042719</td>\n",
       "      <td>0.000125</td>\n",
       "      <td>1.040798</td>\n",
       "      <td>0.000278</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1.037618</td>\n",
       "      <td>0.000168</td>\n",
       "      <td>1.035506</td>\n",
       "      <td>0.000325</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>1.032598</td>\n",
       "      <td>0.000202</td>\n",
       "      <td>1.030314</td>\n",
       "      <td>0.000348</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>1.027623</td>\n",
       "      <td>0.000214</td>\n",
       "      <td>1.025141</td>\n",
       "      <td>0.000372</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>1.022738</td>\n",
       "      <td>0.000228</td>\n",
       "      <td>1.020057</td>\n",
       "      <td>0.000415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>1.017914</td>\n",
       "      <td>0.000210</td>\n",
       "      <td>1.015055</td>\n",
       "      <td>0.000409</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>1.013198</td>\n",
       "      <td>0.000264</td>\n",
       "      <td>1.010153</td>\n",
       "      <td>0.000470</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1.008553</td>\n",
       "      <td>0.000277</td>\n",
       "      <td>1.005318</td>\n",
       "      <td>0.000528</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>1.003893</td>\n",
       "      <td>0.000316</td>\n",
       "      <td>1.000488</td>\n",
       "      <td>0.000558</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>0.999396</td>\n",
       "      <td>0.000331</td>\n",
       "      <td>0.995823</td>\n",
       "      <td>0.000584</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>0.994928</td>\n",
       "      <td>0.000303</td>\n",
       "      <td>0.991180</td>\n",
       "      <td>0.000564</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>0.990533</td>\n",
       "      <td>0.000315</td>\n",
       "      <td>0.986601</td>\n",
       "      <td>0.000579</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>0.986199</td>\n",
       "      <td>0.000314</td>\n",
       "      <td>0.982076</td>\n",
       "      <td>0.000591</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>0.981933</td>\n",
       "      <td>0.000273</td>\n",
       "      <td>0.977665</td>\n",
       "      <td>0.000562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>0.977785</td>\n",
       "      <td>0.000215</td>\n",
       "      <td>0.973339</td>\n",
       "      <td>0.000494</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>0.973670</td>\n",
       "      <td>0.000221</td>\n",
       "      <td>0.969044</td>\n",
       "      <td>0.000519</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>0.969551</td>\n",
       "      <td>0.000197</td>\n",
       "      <td>0.964769</td>\n",
       "      <td>0.000486</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>0.965531</td>\n",
       "      <td>0.000184</td>\n",
       "      <td>0.960549</td>\n",
       "      <td>0.000477</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>0.961554</td>\n",
       "      <td>0.000223</td>\n",
       "      <td>0.956399</td>\n",
       "      <td>0.000479</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>0.957695</td>\n",
       "      <td>0.000299</td>\n",
       "      <td>0.952363</td>\n",
       "      <td>0.000561</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>0.953856</td>\n",
       "      <td>0.000313</td>\n",
       "      <td>0.948354</td>\n",
       "      <td>0.000591</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1688</th>\n",
       "      <td>0.608434</td>\n",
       "      <td>0.005571</td>\n",
       "      <td>0.443194</td>\n",
       "      <td>0.002957</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1689</th>\n",
       "      <td>0.608431</td>\n",
       "      <td>0.005572</td>\n",
       "      <td>0.443141</td>\n",
       "      <td>0.002964</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1690</th>\n",
       "      <td>0.608435</td>\n",
       "      <td>0.005583</td>\n",
       "      <td>0.443068</td>\n",
       "      <td>0.002954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1691</th>\n",
       "      <td>0.608429</td>\n",
       "      <td>0.005589</td>\n",
       "      <td>0.442994</td>\n",
       "      <td>0.002954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1692</th>\n",
       "      <td>0.608422</td>\n",
       "      <td>0.005593</td>\n",
       "      <td>0.442928</td>\n",
       "      <td>0.002958</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1693</th>\n",
       "      <td>0.608421</td>\n",
       "      <td>0.005604</td>\n",
       "      <td>0.442890</td>\n",
       "      <td>0.002957</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1694</th>\n",
       "      <td>0.608416</td>\n",
       "      <td>0.005612</td>\n",
       "      <td>0.442835</td>\n",
       "      <td>0.002971</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1695</th>\n",
       "      <td>0.608414</td>\n",
       "      <td>0.005619</td>\n",
       "      <td>0.442774</td>\n",
       "      <td>0.002965</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1696</th>\n",
       "      <td>0.608408</td>\n",
       "      <td>0.005613</td>\n",
       "      <td>0.442701</td>\n",
       "      <td>0.002979</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1697</th>\n",
       "      <td>0.608413</td>\n",
       "      <td>0.005620</td>\n",
       "      <td>0.442638</td>\n",
       "      <td>0.002979</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1698</th>\n",
       "      <td>0.608404</td>\n",
       "      <td>0.005612</td>\n",
       "      <td>0.442568</td>\n",
       "      <td>0.002978</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1699</th>\n",
       "      <td>0.608405</td>\n",
       "      <td>0.005616</td>\n",
       "      <td>0.442510</td>\n",
       "      <td>0.002974</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1700</th>\n",
       "      <td>0.608397</td>\n",
       "      <td>0.005608</td>\n",
       "      <td>0.442434</td>\n",
       "      <td>0.002981</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1701</th>\n",
       "      <td>0.608398</td>\n",
       "      <td>0.005609</td>\n",
       "      <td>0.442368</td>\n",
       "      <td>0.002986</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1702</th>\n",
       "      <td>0.608402</td>\n",
       "      <td>0.005616</td>\n",
       "      <td>0.442323</td>\n",
       "      <td>0.002990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1703</th>\n",
       "      <td>0.608402</td>\n",
       "      <td>0.005629</td>\n",
       "      <td>0.442255</td>\n",
       "      <td>0.002984</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1704</th>\n",
       "      <td>0.608403</td>\n",
       "      <td>0.005630</td>\n",
       "      <td>0.442201</td>\n",
       "      <td>0.002992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1705</th>\n",
       "      <td>0.608395</td>\n",
       "      <td>0.005638</td>\n",
       "      <td>0.442125</td>\n",
       "      <td>0.002998</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1706</th>\n",
       "      <td>0.608390</td>\n",
       "      <td>0.005629</td>\n",
       "      <td>0.442040</td>\n",
       "      <td>0.002983</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1707</th>\n",
       "      <td>0.608391</td>\n",
       "      <td>0.005628</td>\n",
       "      <td>0.441959</td>\n",
       "      <td>0.002999</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1708</th>\n",
       "      <td>0.608390</td>\n",
       "      <td>0.005629</td>\n",
       "      <td>0.441877</td>\n",
       "      <td>0.003011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1709</th>\n",
       "      <td>0.608385</td>\n",
       "      <td>0.005620</td>\n",
       "      <td>0.441819</td>\n",
       "      <td>0.003015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1710</th>\n",
       "      <td>0.608384</td>\n",
       "      <td>0.005614</td>\n",
       "      <td>0.441740</td>\n",
       "      <td>0.002996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1711</th>\n",
       "      <td>0.608386</td>\n",
       "      <td>0.005631</td>\n",
       "      <td>0.441683</td>\n",
       "      <td>0.003009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1712</th>\n",
       "      <td>0.608394</td>\n",
       "      <td>0.005626</td>\n",
       "      <td>0.441624</td>\n",
       "      <td>0.003007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1713</th>\n",
       "      <td>0.608383</td>\n",
       "      <td>0.005635</td>\n",
       "      <td>0.441548</td>\n",
       "      <td>0.003029</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1714</th>\n",
       "      <td>0.608384</td>\n",
       "      <td>0.005632</td>\n",
       "      <td>0.441483</td>\n",
       "      <td>0.003032</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1715</th>\n",
       "      <td>0.608385</td>\n",
       "      <td>0.005633</td>\n",
       "      <td>0.441408</td>\n",
       "      <td>0.003034</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1716</th>\n",
       "      <td>0.608388</td>\n",
       "      <td>0.005639</td>\n",
       "      <td>0.441361</td>\n",
       "      <td>0.003046</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1717</th>\n",
       "      <td>0.608376</td>\n",
       "      <td>0.005639</td>\n",
       "      <td>0.441284</td>\n",
       "      <td>0.003049</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>1718 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      test-mlogloss-mean  test-mlogloss-std  train-mlogloss-mean  \\\n",
       "0               1.092626           0.000038             1.092429   \n",
       "1               1.086694           0.000064             1.086319   \n",
       "2               1.080849           0.000085             1.080284   \n",
       "3               1.075148           0.000074             1.074393   \n",
       "4               1.069515           0.000120             1.068558   \n",
       "5               1.063965           0.000113             1.062826   \n",
       "6               1.058490           0.000097             1.057152   \n",
       "7               1.053182           0.000127             1.051654   \n",
       "8               1.047893           0.000138             1.046154   \n",
       "9               1.042719           0.000125             1.040798   \n",
       "10              1.037618           0.000168             1.035506   \n",
       "11              1.032598           0.000202             1.030314   \n",
       "12              1.027623           0.000214             1.025141   \n",
       "13              1.022738           0.000228             1.020057   \n",
       "14              1.017914           0.000210             1.015055   \n",
       "15              1.013198           0.000264             1.010153   \n",
       "16              1.008553           0.000277             1.005318   \n",
       "17              1.003893           0.000316             1.000488   \n",
       "18              0.999396           0.000331             0.995823   \n",
       "19              0.994928           0.000303             0.991180   \n",
       "20              0.990533           0.000315             0.986601   \n",
       "21              0.986199           0.000314             0.982076   \n",
       "22              0.981933           0.000273             0.977665   \n",
       "23              0.977785           0.000215             0.973339   \n",
       "24              0.973670           0.000221             0.969044   \n",
       "25              0.969551           0.000197             0.964769   \n",
       "26              0.965531           0.000184             0.960549   \n",
       "27              0.961554           0.000223             0.956399   \n",
       "28              0.957695           0.000299             0.952363   \n",
       "29              0.953856           0.000313             0.948354   \n",
       "...                  ...                ...                  ...   \n",
       "1688            0.608434           0.005571             0.443194   \n",
       "1689            0.608431           0.005572             0.443141   \n",
       "1690            0.608435           0.005583             0.443068   \n",
       "1691            0.608429           0.005589             0.442994   \n",
       "1692            0.608422           0.005593             0.442928   \n",
       "1693            0.608421           0.005604             0.442890   \n",
       "1694            0.608416           0.005612             0.442835   \n",
       "1695            0.608414           0.005619             0.442774   \n",
       "1696            0.608408           0.005613             0.442701   \n",
       "1697            0.608413           0.005620             0.442638   \n",
       "1698            0.608404           0.005612             0.442568   \n",
       "1699            0.608405           0.005616             0.442510   \n",
       "1700            0.608397           0.005608             0.442434   \n",
       "1701            0.608398           0.005609             0.442368   \n",
       "1702            0.608402           0.005616             0.442323   \n",
       "1703            0.608402           0.005629             0.442255   \n",
       "1704            0.608403           0.005630             0.442201   \n",
       "1705            0.608395           0.005638             0.442125   \n",
       "1706            0.608390           0.005629             0.442040   \n",
       "1707            0.608391           0.005628             0.441959   \n",
       "1708            0.608390           0.005629             0.441877   \n",
       "1709            0.608385           0.005620             0.441819   \n",
       "1710            0.608384           0.005614             0.441740   \n",
       "1711            0.608386           0.005631             0.441683   \n",
       "1712            0.608394           0.005626             0.441624   \n",
       "1713            0.608383           0.005635             0.441548   \n",
       "1714            0.608384           0.005632             0.441483   \n",
       "1715            0.608385           0.005633             0.441408   \n",
       "1716            0.608388           0.005639             0.441361   \n",
       "1717            0.608376           0.005639             0.441284   \n",
       "\n",
       "      train-mlogloss-std  \n",
       "0               0.000038  \n",
       "1               0.000065  \n",
       "2               0.000125  \n",
       "3               0.000161  \n",
       "4               0.000208  \n",
       "5               0.000219  \n",
       "6               0.000219  \n",
       "7               0.000242  \n",
       "8               0.000264  \n",
       "9               0.000278  \n",
       "10              0.000325  \n",
       "11              0.000348  \n",
       "12              0.000372  \n",
       "13              0.000415  \n",
       "14              0.000409  \n",
       "15              0.000470  \n",
       "16              0.000528  \n",
       "17              0.000558  \n",
       "18              0.000584  \n",
       "19              0.000564  \n",
       "20              0.000579  \n",
       "21              0.000591  \n",
       "22              0.000562  \n",
       "23              0.000494  \n",
       "24              0.000519  \n",
       "25              0.000486  \n",
       "26              0.000477  \n",
       "27              0.000479  \n",
       "28              0.000561  \n",
       "29              0.000591  \n",
       "...                  ...  \n",
       "1688            0.002957  \n",
       "1689            0.002964  \n",
       "1690            0.002954  \n",
       "1691            0.002954  \n",
       "1692            0.002958  \n",
       "1693            0.002957  \n",
       "1694            0.002971  \n",
       "1695            0.002965  \n",
       "1696            0.002979  \n",
       "1697            0.002979  \n",
       "1698            0.002978  \n",
       "1699            0.002974  \n",
       "1700            0.002981  \n",
       "1701            0.002986  \n",
       "1702            0.002990  \n",
       "1703            0.002984  \n",
       "1704            0.002992  \n",
       "1705            0.002998  \n",
       "1706            0.002983  \n",
       "1707            0.002999  \n",
       "1708            0.003011  \n",
       "1709            0.003015  \n",
       "1710            0.002996  \n",
       "1711            0.003009  \n",
       "1712            0.003007  \n",
       "1713            0.003029  \n",
       "1714            0.003032  \n",
       "1715            0.003034  \n",
       "1716            0.003046  \n",
       "1717            0.003049  \n",
       "\n",
       "[1718 rows x 4 columns]"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cvresult_round2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the cv results\n",
    "cvresult_round2.to_csv('xgBoostRentalCV_Result2.csv', index_label='n_estimators')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.47976666222864295\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\preprocessing\\label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.\n",
      "  if diff:\n"
     ]
    }
   ],
   "source": [
    "# Set the optimal number of weak learners found by xgb.cv\n",
    "xgb2.n_estimators = cvresult_round2.shape[0]\n",
    "# Retrain the model with the optimal parameters from cross-validation\n",
    "xgb2.fit(x_train, y_train, eval_metric='mlogloss')\n",
    "\n",
    "# Evaluate on the training set\n",
    "predict_proba2 = xgb2.predict_proba(x_train)\n",
    "predict_result = xgb2.predict(x_train)\n",
    "logloss2 = log_loss(y_train, predict_proba2)\n",
    "print(logloss2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('Accuracy :', 0.8057333333333333)\n"
     ]
    }
   ],
   "source": [
    "print(\"Accuracy :\", metrics.accuracy_score(y_train, predict_result))"
   ]
  },
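  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the log loss and accuracy above are computed on the training set, so they are optimistic compared with the cross-validated `test-mlogloss-mean` (about 0.608) reported by `xgb.cv`. As a hedged, self-contained illustration of how these two metrics are derived from predicted probabilities (the `y_true` and `proba` values below are toy data, not model output):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import log_loss, accuracy_score\n",
    "\n",
    "# Toy 3-class example: true labels and predicted class probabilities\n",
    "y_true = [0, 1, 2, 2]\n",
    "proba = np.array([[0.8, 0.1, 0.1],\n",
    "                  [0.2, 0.6, 0.2],\n",
    "                  [0.1, 0.2, 0.7],\n",
    "                  [0.3, 0.4, 0.3]])\n",
    "\n",
    "# log_loss scores the full probability vector; accuracy only the argmax\n",
    "print(log_loss(y_true, proba))                       # mean of -log p(true class)\n",
    "print(accuracy_score(y_true, proba.argmax(axis=1)))  # 3 of 4 argmaxes correct"
   ]
  },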
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## (10) Predict on the test set and save the results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Anaconda\\lib\\site-packages\\sklearn\\preprocessing\\label.py:151: DeprecationWarning: The truth value of an empty array is ambiguous. Returning False, but in future this will result in an error. Use `array.size > 0` to check that an array is not empty.\n",
      "  if diff:\n"
     ]
    }
   ],
   "source": [
    "# Predict hard labels and class probabilities on the test set\n",
    "predict_test_result = xgb2.predict(data_test)\n",
    "predict_test_probability = xgb2.predict_proba(data_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_test = pd.DataFrame(predict_test_result, columns=[\"interest_level\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Map numeric class ids back to the original interest_level labels\n",
    "y_test[\"interest_level\"] = y_test[\"interest_level\"].map({0: 'low', 1: 'medium', 2: 'high'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>interest_level</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74629</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74630</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74631</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74632</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74633</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74634</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74635</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74636</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74637</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74638</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74639</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74640</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74641</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74642</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74643</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74644</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74645</th>\n",
       "      <td>low</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74646</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74647</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74648</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74649</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74650</th>\n",
       "      <td>medium</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74651</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74652</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74653</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74654</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74655</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74656</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74657</th>\n",
       "      <td>low</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>74658</th>\n",
       "      <td>high</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>74659 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      interest_level\n",
       "0               high\n",
       "1               high\n",
       "2               high\n",
       "3             medium\n",
       "4               high\n",
       "5               high\n",
       "6               high\n",
       "7             medium\n",
       "8               high\n",
       "9               high\n",
       "10              high\n",
       "11            medium\n",
       "12              high\n",
       "13              high\n",
       "14              high\n",
       "15              high\n",
       "16            medium\n",
       "17              high\n",
       "18              high\n",
       "19            medium\n",
       "20              high\n",
       "21              high\n",
       "22              high\n",
       "23            medium\n",
       "24              high\n",
       "25            medium\n",
       "26              high\n",
       "27              high\n",
       "28              high\n",
       "29              high\n",
       "...              ...\n",
       "74629         medium\n",
       "74630         medium\n",
       "74631           high\n",
       "74632           high\n",
       "74633           high\n",
       "74634           high\n",
       "74635           high\n",
       "74636           high\n",
       "74637           high\n",
       "74638           high\n",
       "74639           high\n",
       "74640           high\n",
       "74641           high\n",
       "74642           high\n",
       "74643           high\n",
       "74644           high\n",
       "74645            low\n",
       "74646           high\n",
       "74647           high\n",
       "74648           high\n",
       "74649         medium\n",
       "74650         medium\n",
       "74651           high\n",
       "74652           high\n",
       "74653           high\n",
       "74654           high\n",
       "74655           high\n",
       "74656           high\n",
       "74657            low\n",
       "74658           high\n",
       "\n",
       "[74659 rows x 1 columns]"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the predicted labels\n",
    "y_test.to_csv(\"rent_ListForPredict.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_probability = pd.DataFrame(predict_test_probability, columns=[\"low\", \"medium\", \"high\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>low</th>\n",
       "      <th>medium</th>\n",
       "      <th>high</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.062218</td>\n",
       "      <td>0.389764</td>\n",
       "      <td>0.548019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.166255</td>\n",
       "      <td>0.314202</td>\n",
       "      <td>0.519543</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.015784</td>\n",
       "      <td>0.089169</td>\n",
       "      <td>0.895048</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.090464</td>\n",
       "      <td>0.482632</td>\n",
       "      <td>0.426904</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.050954</td>\n",
       "      <td>0.270072</td>\n",
       "      <td>0.678974</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        low    medium      high\n",
       "0  0.062218  0.389764  0.548019\n",
       "1  0.166255  0.314202  0.519543\n",
       "2  0.015784  0.089169  0.895048\n",
       "3  0.090464  0.482632  0.426904\n",
       "4  0.050954  0.270072  0.678974"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_probability.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save the predicted probabilities\n",
    "y_probability.to_csv(\"rent_ListForPredict_prob.csv\")"
   ]
  },
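  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sanity check on the two files saved above, the link between the probability output and the hard labels can be sketched in isolation: the predicted label is the argmax over the three probability columns, mapped back to strings with the same dictionary used earlier. This is a toy sketch (the `proba` array stands in for `xgb2.predict_proba(data_test)`, and the low/medium/high column order is assumed to match the label encoding):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Toy probabilities standing in for xgb2.predict_proba(data_test)\n",
    "proba = np.array([[0.1, 0.3, 0.6],\n",
    "                  [0.7, 0.2, 0.1]])\n",
    "sub = pd.DataFrame(proba, columns=[\"low\", \"medium\", \"high\"])\n",
    "\n",
    "# Hard label = argmax over the probability columns, mapped back to strings\n",
    "idx = sub[[\"low\", \"medium\", \"high\"]].values.argmax(axis=1)\n",
    "sub[\"interest_level\"] = pd.Series(idx).map({0: \"low\", 1: \"medium\", 2: \"high\"})\n",
    "print(sub)"
   ]
  },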
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
